Add rocBLAS yaml files for gfx1151 #699

Merged
amcamd merged 4 commits into develop from import/develop/amd-mtrifuno_rocBLAS/complete_gfx1151_support
Jul 17, 2025
Contributor

@amd-mtrifuno amd-mtrifuno commented Jul 16, 2025

  • Add Strix Halo yaml files that are copies of the Navi33 yaml files (the same change that was made for Strix Point)

  • tensile_tag.txt needs to be updated after the PR with the Tensile changes is merged in rocm-libraries and, automatically, in ROCm/Tensile (Complete gfx1151 Tensile support #696)

TorreZuk previously approved these changes Jul 16, 2025
Contributor

@TorreZuk TorreZuk left a comment

This needs a Tensile review. Will the Tensile PR land first, and will a tensile_tag commit then be added?

amd-mtrifuno added a commit that referenced this pull request Jul 16, 2025
- Add missing changes for gfx1151
- rocBLAS PR with gfx1151 yaml files that requires Tensile changes:
#699
assistant-librarian Bot pushed a commit to ROCm/Tensile that referenced this pull request Jul 16, 2025
Complete gfx1151 Tensile support (#696)

- Add missing changes for gfx1151
- rocBLAS PR with gfx1151 yaml files that requires Tensile changes:
ROCm/rocm-libraries#699
amd-mtrifuno added a commit that referenced this pull request Jul 16, 2025
- Add missing changes for gfx1151
- rocBLAS PR with gfx1151 yaml files that requires Tensile changes:
#699
@amd-mtrifuno
Contributor Author

> This needs a Tensile review. Will the Tensile PR land first, and will a tensile_tag commit then be added?

The Tensile PR has been merged, and I updated tensile_tag.

@TorreZuk
Contributor

@yoichiyoshida or @babakpst, can you also review? All tests appear to pass.

@amcamd amcamd merged commit 78d9061 into develop Jul 17, 2025
11 of 12 checks passed
@amcamd amcamd deleted the import/develop/amd-mtrifuno_rocBLAS/complete_gfx1151_support branch July 17, 2025 16:56
assistant-librarian Bot pushed a commit to ROCm/rocBLAS that referenced this pull request Jul 17, 2025
Add rocBLAS yaml files for gfx1151 (#699)

- Add Strix Halo yaml files that are copy of Navi33 yaml files (the same
changes added for Strix Point)

- Need to update tensile_tag.txt after PR with Tensile changes is merged
in rocm-libraries and automatically in ROCm/Tensile
(ROCm/rocm-libraries#696)

---------

Co-authored-by: Torre Zuk <42548444+TorreZuk@users.noreply.github.com>
amd-mtrifuno added a commit that referenced this pull request Jul 18, 2025
- Add Strix Halo yaml files that are copy of Navi33 yaml files (the same
changes added for Strix Point)

- Need to update tensile_tag.txt after PR with Tensile changes is merged (#708)

---------

Co-authored-by: Torre Zuk <42548444+TorreZuk@users.noreply.github.com>
@darkbasic

Did you manage to backport this into 7.0.1?
https://gitlab.freedesktop.org/drm/amd/-/issues/4321#note_3048205

@amd-mtrifuno
Contributor Author

> Did you manage to backport this into 7.0.1? https://gitlab.freedesktop.org/drm/amd/-/issues/4321#note_3048205

Yes, PR #744 has been merged into the release/rocm-rel-7.0 branch.

DDEle added a commit that referenced this pull request Apr 21, 2026
## Motivation

The AITER and FA test dockers (`Dockerfile.aiter`, `Dockerfile.fa`)
inherit from the `rocm/pytorch` base image. Recent updates to that base
image dropped the `render` group from `/etc/group`, so every parallel
test stage now fails on the test agents with:

```
docker: Error response from daemon: Unable to find group render:
no matching entries in group file.
```

Docker resolves the `--group-add render` option that Jenkins passes against
the **container's** `/etc/group`, not the host's, so even though the test
agents have `render` in their `/etc/group` (GID 109), the container lookup fails.
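
The lookup behaviour described above can be illustrated with a minimal Python sketch. It is not how Docker or Jenkins implement the lookup; it only shows why a name that exists in the host's group file can still fail to resolve when the container's file is consulted instead. The function name `resolve_group` and the sample file contents are hypothetical; only GID 109 is taken from the description.

```python
from typing import Optional


def resolve_group(group_file_text: str, name: str) -> Optional[int]:
    """Return the GID for `name` from /etc/group-style text, or None if absent."""
    for line in group_file_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        # Each entry has the form group_name:password:GID:user_list
        fields = line.split(":")
        if len(fields) >= 3 and fields[0] == name:
            return int(fields[2])
    return None


# Hypothetical file contents for illustration; only GID 109 comes from the text.
HOST_GROUP = "root:x:0:\nvideo:x:44:\nrender:x:109:\n"
CONTAINER_GROUP = "root:x:0:\nvideo:x:44:\n"

print(resolve_group(HOST_GROUP, "render"))       # 109
print(resolve_group(CONTAINER_GROUP, "render"))  # None
```

Resolving by name against the container's file is what makes the base image's contents matter; the host entry is never consulted in this model.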

This pattern affects every recent develop build
([#673](http://micimaster.amd.com/blue/organizations/jenkins/rocm-libraries-folder%2FComposable%20Kernel/detail/develop/673),
[#674](http://micimaster.amd.com/blue/organizations/jenkins/rocm-libraries-folder%2FComposable%20Kernel/detail/develop/674),
[#686](http://micimaster.amd.com/blue/organizations/jenkins/rocm-libraries-folder%2FComposable%20Kernel/detail/develop/686),
[#688](http://micimaster.amd.com/blue/organizations/jenkins/rocm-libraries-folder%2FComposable%20Kernel/detail/develop/688),
[#699](http://micimaster.amd.com/blue/organizations/jenkins/rocm-libraries-folder%2FComposable%20Kernel/detail/develop/699),
[#708](http://micimaster.amd.com/blue/organizations/jenkins/rocm-libraries-folder%2FComposable%20Kernel/detail/develop/708)
— 6 days in a row), where AITER tests fail in seconds and the cascading
failure aborts all downstream Build/FMHA/TILE_ENGINE stages.

## Technical Details

Add `groupadd -f render` to both `Dockerfile.aiter` and `Dockerfile.fa`,
mirroring what the main `Dockerfile` already does (`Dockerfile:96`) and
what `Dockerfile.pytorch` does (`Dockerfile.pytorch:4`). The `-f` flag
makes it idempotent — silently succeeds if the group already exists.

This guarantees the `render` group is always present in the container,
regardless of whether the base image happens to ship it.
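
The idempotence that `-f` provides can be sketched in Python as a pure function over group-file text. This is a simplified model under stated assumptions, not the real `groupadd` implementation: the helper name `ensure_group` is hypothetical, and the GID choice (one above the highest existing GID) is a placeholder for the range-based allocation the real tool performs.

```python
def ensure_group(group_file_text: str, name: str) -> str:
    """Return group-file text that contains `name`, leaving it unchanged if present."""
    existing = {}
    for line in group_file_text.splitlines():
        fields = line.split(":")
        if len(fields) >= 3 and fields[2].isdigit():
            existing[fields[0]] = int(fields[2])

    # Idempotent path: the group already exists, so nothing is modified.
    if name in existing:
        return group_file_text

    # Simplified GID choice (assumption): one above the highest GID seen.
    new_gid = max(existing.values(), default=999) + 1
    sep = "" if not group_file_text or group_file_text.endswith("\n") else "\n"
    return f"{group_file_text}{sep}{name}:x:{new_gid}:\n"


base = "root:x:0:\nvideo:x:44:\n"
once = ensure_group(base, "render")
twice = ensure_group(once, "render")
print(once == twice)  # True: a second call changes nothing
```

Because the second call is a no-op, the same step is safe whether or not the base image already ships the group.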

## Test Plan
Triggering AITER CI job: 

## Test Result

## Submission Checklist

- [x] Look over the contributing guidelines at
  https://github.com/ROCm/ROCm/blob/develop/CONTRIBUTING.md#pull-requests.
aledudek pushed a commit that referenced this pull request May 20, 2026